Predicting Programming Community Popularity on StackOverflow from Initial Affiliation Networks
Abstract
StackOverflow has become a popular question and answer site for programmers since its launch in 2008. As new programming frameworks and languages emerge, StackOverflow communities form around the new tags to ask and answer questions. Our analysis investigated both the affiliation network between tags across the lifetime of StackOverflow and how the initial affiliation networks of these programming tags, formed during the first 4 weeks of their existence, predict tag popularity in November 2013.
Thus far, much of the literature on groups within social networks has focused on static properties such as community detection and clustering, rather than on predicting properties over time. Even recent work in this area primarily considers the relations between group members and how these properties relate to the group's evolution. For example, Backstrom et al. found that groups in social graphs with a large number of triangles grow much less quickly than groups with a small number [4], while Kairam et al. discovered that the class of growth a group experiences can predict its longevity [11]. Meanwhile, Zheleva et al. and Geard et al. studied the relationship over time between affiliation networks and their underlying social networks, where the affiliation networks form via links between users and subgroups while the social networks involve direct links between users. In general, however, research on the evolution of affiliation networks themselves, particularly when they exist independently of an underlying social network, has been sparse. We considered the affiliation network over time between tags, looking at the relationships between groups rather than between group members. Because StackOverflow lacks formalized "friendships" between users, the affiliation network forms without the influence of an explicit social network.
StackOverflow has over 2 million users and 6 million questions as of November 2013 [2], and past work shows that roughly three-fourths of these users have contributed to at least one post [12]. Yet, unlike traditional social networks where groups form directly from lists of members, StackOverflow communities arise from the tags that programmers use to label questions. These tags center around specific programming languages, frameworks, and platforms. We leveraged this tag data to form links in the affiliation network in two ways: via co-occurring tags, and via user participation in posts across different tags. We primarily modeled these tag communities as a modified folded bipartite affiliation network, as further discussed in Section 3. Communities represented by tags form the nodes, while the number of engaged members shared between two StackOverflow tags determines the weight of the edge between them. We analyze this affiliation network for the all-time activity on StackOverflow. Next, for each of the top 1000 tags, we consider the tag's activity and relationships in the affiliation network during the first 28 days of the tag's life on StackOverflow, measured from when the first post with the tag appears. We classify each tag as "more popular" (top 500 tags in November 2013) or "less popular" (tags ranked 501 to 1000), where the rank is based on the cumulative number of questions relative to other tags. We compute features including degree centrality, closeness centrality, average shortest path, clustering coefficient, and PageRank based on the affiliation network generated from user activity, as well as on a simpler affiliation network built from co-occurring tags. We apply Random Forest Classifiers, Linear SVCs, Logistic Regression, and AdaBoost Classifiers to predict tag success, showing the importance of a tag's place in the initial tag affiliation network.
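The folding step described above can be illustrated with a minimal sketch. It assumes a plain folding of a user-tag bipartite mapping, not the modified version the paper defers to Section 3, and the function names (`fold_affiliation`, `degree_centrality`) and the toy data are hypothetical. Each pair of tags receives an edge whose weight is the number of users engaged with both, and degree centrality is then computed on the resulting tag graph.

```python
from collections import defaultdict
from itertools import combinations


def fold_affiliation(user_tags):
    """Fold a bipartite user-tag mapping into a weighted tag-tag graph.

    user_tags maps a user id to the set of tags that user engaged with.
    Returns a dict mapping a sorted (tag_a, tag_b) pair to the number of
    users shared by the two tags (the edge weight).
    """
    weights = defaultdict(int)
    for tags in user_tags.values():
        # Every pair of tags touched by the same user gains one shared member.
        for tag_a, tag_b in combinations(sorted(tags), 2):
            weights[(tag_a, tag_b)] += 1
    return dict(weights)


def degree_centrality(weights, tags):
    """Unweighted degree centrality: number of neighbours divided by (n - 1)."""
    neighbours = defaultdict(set)
    for tag_a, tag_b in weights:
        neighbours[tag_a].add(tag_b)
        neighbours[tag_b].add(tag_a)
    n = len(tags)
    if n <= 1:
        return {tag: 0.0 for tag in tags}
    return {tag: len(neighbours[tag]) / (n - 1) for tag in tags}


# Hypothetical toy data (not taken from the StackOverflow dump).
user_tags = {
    "u1": {"python", "django"},
    "u2": {"python", "django", "numpy"},
    "u3": {"python", "numpy"},
    "u4": {"haskell"},
}

edges = fold_affiliation(user_tags)
all_tags = sorted(set().union(*user_tags.values()))
centrality = degree_centrality(edges, all_tags)

for pair, weight in sorted(edges.items()):
    print(pair, weight)
for tag in all_tags:
    print(tag, round(centrality[tag], 3))
```

In this toy graph, a tag with no shared users (here `haskell`) remains an isolated node with zero centrality, which is the kind of structural signal the features above are meant to capture for a tag's first 28 days.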
In all, our results suggest that the initial affiliation network around a tag is more indicative of later success than metrics on initial activity. We begin with a survey of prior work surrounding groups and evolution of networks in Section 2, followed by a discussion of our data collection process, network modeling, and features in Section 3. Next we share the analysis and predictive results in Section 4, before concluding and discussing future work in Section 5.
Similar resources
Predicting Tags for StackOverflow Questions
We present a system that is able to automatically assign tags to questions from the question-answering site StackOverflow. Our system consists of a programming language detection system and an SVM using content-based features. When testing on an unseen test set, we achieve a mean F1 of 0.41 on this task.
Affiliation Influence on Recommendation in Academic Social Networks
Social networks have been the focus of many studies, from communities’ identification to link prediction. Here, we propose a method based on researchers’ institution affiliation for predicting links in a collaboration social network. Initial experiments show that considering the institution affiliation aspect, the set of recommendations is more accurate and concise, leading to a more efficient ...
Men at work: the StackOverflow case
Online communities are flourishing as social meeting web-spaces for users and peer community members. Different online communities require different levels of competence for participants to join, and scattered evidence suggests that the female gender and minorities can be under-represented. Here we focus on the popular programming-related Q&A website StackOverflow. StackOverflow is based on ear...
Predicting the quality of questions on Stackoverflow
Community Question Answering websites (CQA) have a growing popularity as a way of providing and searching of information. CQA attract users as they provide a direct and rapid way to find the desired information. As recognizing good questions can improve the CQA services and the user’s experience, the current study focuses on question quality instead. Specifically, we predict question quality an...
Software.zhishi.schema: A Software Programming Taxonomy Derived from Stackoverflow
In this paper, we are the first to construct a software programming taxonomy from Stackoverflow. More precisely, we propose a machine learning based method with novel features to capture the hierarchical semantic structure of tags in Stackoverflow. A graph pruning algorithm is applied to eliminate the conflicts by constructing a Directed Acyclic Graph (DAG). As a result, our dataset, named Soft...